Co-evolution of Selection and Influence in Social Networks 



Yoon-Sik Cho and Greg Ver Steeg and Aram Galstyan 

USC Information Sciences Institute 
Marina del Rey, California 90292 



Abstract 

Many networks are complex dynamical systems, where both attributes of nodes and topology of the network (link structure) can change with time. We propose a model of co-evolving networks where both node attributes and network structure evolve under mutual influence. Specifically, we consider a mixed membership stochastic blockmodel, where the probability of observing a link between two nodes depends on their current membership vectors, while those membership vectors themselves evolve in the presence of a link between the nodes. Thus, the network is shaped by the interaction of stochastic processes describing the nodes, while the processes themselves are influenced by the changing network structure. We derive an efficient variational inference procedure for our model, and validate the model on both synthetic and real-world data.

Introduction 

The recent surge in online social media has made it possible to examine social networks at an unprecedented scale. Thus, it is important to have scalable approaches for modeling and understanding the statistical and dynamical properties of such systems. Most real-world networks are inherently complex dynamical systems, where both the attributes of the nodes and the topology of the network can change with time. Furthermore, those changes are often intertwined with each other, providing complex feedback mechanisms between node and link dynamics. As an illustrative example of such an interplay in social networks, here we focus on the processes of selection and influence. The former means that nodes tend to interact with similar nodes, whereas the latter asserts that the evolution of a node's attributes is affected by its neighbors.

The problem of properly characterizing selection and influence has been the subject of extensive study in sociology. For instance, (Steglich, Snijders, and Pearson 2010) suggested a continuous-time agent-based model of network co-evolution. In this model, each agent is characterized by a utility function that depends on the agent's individual attributes as well as on his/her local neighborhood in the network. The agents evolve as continuous-time Markovian processes which, at randomly chosen time points, select an action to maximize their utility. Despite its intuitive appeal, a serious shortcoming of this model is that it does not handle missing data well, so most of the attributes have to be fully observable. This was addressed in (Fan and Shelton 2009), where a continuous-time dynamic Bayesian approach was developed.

Continuous-time models have certain advantages when the network observations are infrequent and well separated in time. In situations where more fine-grained data is available, however, discrete-time models are more suitable (Hanneke, Fu, and Xing 2010). Here we suggest a discrete-time dynamical network model that accounts for both selection and influence. Our model is based on the Mixed Membership Stochastic Blockmodel (MMSB) (Airoldi et al. 2008). MMSBs are an extension of stochastic blockmodels, which have been studied extensively in both the social sciences and computer science (Holland, Laskey, and Leinhardt 1983; Goldenberg et al. 2010). In a stochastic blockmodel, each node is assigned to a block (or a role), and the pattern of interactions between different nodes depends only on their block assignments. Many situations, however, are better described by multifaceted interactions, where nodes can bear multiple latent roles that influence their relationships to others. MMSB accounts for such "mixed" interactions by allowing each node to have a probability distribution over roles, and by making the interactions role-dependent (Airoldi et al. 2008).

Our Co-evolving Mixed Membership Stochastic Blockmodel, or CMMSB, provides a dynamic generalization of the mixed membership model by explicitly modeling the variation in the node membership vectors. A dynamic extension of the MMSB (dMMSB) was previously suggested in (Fu, Song, and Xing 2009). In contrast to dMMSB, where the dynamics are imposed externally, our model assumes that the membership evolution is driven by the interactions between the nodes through a parametrized influence mechanism. At the same time, the patterns of those interactions themselves change due to the evolution of the node memberships.

Another advantage of our model over dMMSB is that the latter models the aggregate dynamics, e.g., the mean of the logistic normal distribution from which the membership vectors are sampled. CMMSB, in contrast, models each node's trajectory separately, thus providing more flexibility for describing the system dynamics. Of course, more flexibility comes at a higher computational cost, as CMMSB tracks the trajectories of all nodes individually. This additional cost, however, can be well justified in scenarios where the system as a whole is almost static (e.g., no shift in the mean membership vector) but different subsystems experience dynamic changes. One such scenario, dealing with political polarization in the U.S. Senate, is presented in our experimental results section.

Co-evolving Mixed Membership Blockmodel 

Consider a set of N nodes, each of which can have K different roles, and let $\pi_p^t$ be the mixed membership vector of node p at time t. Let $Y^t$ be the network formed by those nodes at time t: $Y^t(p,q) = 1$ if nodes p and q are connected at time t, and $Y^t(p,q) = 0$ otherwise. Further, let $Y^{0:T} = \{Y^0, Y^1, \dots, Y^T\}$ be a time sequence of such networks. The generative process that induces this sequence is described below.

• For each node p at time t = 0, sample the membership vector from a logistic normal distribution over the simplex¹:

$$\mu_p^0 \sim \mathcal{N}(\alpha, A), \qquad \pi_{p,k}^0 = \exp\big(\mu_{p,k}^0 - C(\mu_p^0)\big),$$

where $C(\mu) = \log\big(\sum_k \exp(\mu_k)\big)$ is a normalization constant, and $\alpha$, $A$ are the prior mean and covariance matrix.

• For each node p at time t > 0, the mean of the normal distribution is updated due to the influence of the node's neighbors at the previous time step:

$$\bar{\mu}_p^t = (1-\beta_p)\,\mu_p^{t-1} + \beta_p\,\mu_{S(p,t-1)},$$

where $\mu_{S(p,t-1)}$ is the weighted average of the vectors $\mu_q^{t-1}$ of the nodes that node p has met at time t − 1:

$$\mu_{S(p,t-1)} = \sum_q Y^{t-1}(p,q)\, w_{p\leftarrow q}\, \mu_q^{t-1}.$$

Here $\beta_p$ describes how easily node p is influenced by its neighbors. The membership vector at time t is

$$\pi_{p,k}^t = \exp\big(\mu_{p,k}^t - C(\mu_p^t)\big), \qquad \mu_p^t \sim \mathcal{N}(\bar{\mu}_p^t, \Sigma_\mu),$$

where the covariance $\Sigma_\mu$ accounts for noise in the evolution process.

• For each pair of nodes p, q at time t, sample role indicator vectors from multinomial distributions:

$$z_{p\to q}^t \sim \mathrm{Mult}(\pi_p^t), \qquad z_{p\leftarrow q}^t \sim \mathrm{Mult}(\pi_q^t).$$

Here $z_{p\to q}^t$ is a unit indicator vector of dimension K, so that $z_{p\to q,k}^t = 1$ means node p undertakes role k while interacting with q.

• Sample a link between p and q as a Bernoulli trial:

$$Y^t(p,q) \sim \mathrm{Bernoulli}\big((1-\rho)\, z_{p\to q}^{t\top} B^t z_{p\leftarrow q}^t\big),$$

where $B^t$ is a K × K role-compatibility matrix, so that $B^t_{rs}$ describes the likelihood of interaction between two nodes in roles r and s at time t. When $B^t$ is diagonal, the only possible interactions are among nodes in the same role. Finally, $\rho$ is a parameter that accounts for the sparsity of the network.

Thus, the coupling between the dynamics of different nodes is introduced by allowing the role vector of a node to be influenced by the role vectors of its neighbors. For computational simplicity, we update $\pi_p^t$ by changing its associated $\mu_p^t$. This update of $\mu$ is a linear combination of $\mu_p$ at its current state and the values of its neighbors. The influence is controlled by a node-specific parameter $\beta_p$ and by the pairwise weights $w_{p\leftarrow q}$. The parameter $\beta_p$ describes how easily node p is influenced by its neighbors: $\beta_p = 0$ means it is not influenced at all, whereas $\beta_p = 1$ means its behavior is determined solely by its neighbors. In turn, $w_{p\leftarrow q}$ reflects the influence that node q exerts on node p, with larger values corresponding to more influence.

¹ We found that the logistic normal form of the membership vector suggested in (Fu, Song, and Xing 2009) led to more tractable equations than the Dirichlet distribution.
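The generative process above can be simulated directly. The following is a minimal NumPy sketch, not the authors' implementation. The helper names `softmax` and `sample_cmmsb` are hypothetical. It also makes three assumptions: $\Sigma_\mu$ is diagonal, $B$ is held fixed over time, and the influence weights are passed in as a matrix `W` with `W[p, q]` $= w_{p\leftarrow q}$ and used exactly as given (no normalization).

```python
import numpy as np

def softmax(mu):
    """Logistic map pi_k = exp(mu_k - C(mu)), C(mu) = log sum_k exp(mu_k)."""
    e = np.exp(mu - mu.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def sample_cmmsb(N, K, T, alpha, A, sigma_mu, beta, W, B, rho, seed=0):
    """Forward-sample mu^{0:T}, pi^{0:T} and Y^{0:T} (hypothetical helper).

    alpha (K,), A (K, K): prior mean and covariance of mu_p^0.
    sigma_mu (K,): diagonal of the noise covariance Sigma_mu (assumed diagonal).
    beta (N,): susceptibilities beta_p; W (N, N): W[p, q] = w_{p<-q}.
    B (K, K): role-compatibility matrix, assumed constant over time here.
    rho: sparsity parameter.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros((T + 1, N, K))
    Y = np.zeros((T + 1, N, N), dtype=int)
    mu[0] = rng.multivariate_normal(alpha, A, size=N)       # mu_p^0 ~ N(alpha, A)
    for t in range(T + 1):
        if t > 0:
            # mu_S(p,t-1) = sum_q Y^{t-1}(p,q) w_{p<-q} mu_q^{t-1}
            mu_S = (Y[t - 1] * W) @ mu[t - 1]
            # influenced mean bar_mu_p^t, then Gaussian noise with covariance diag(sigma_mu)
            mean = (1 - beta)[:, None] * mu[t - 1] + beta[:, None] * mu_S
            mu[t] = mean + rng.normal(size=(N, K)) * np.sqrt(sigma_mu)
        pi = softmax(mu[t])
        for p in range(N):
            for q in range(N):
                if p == q:
                    continue                                 # no self-links
                g = rng.choice(K, p=pi[p])                   # z_{p->q}: role of p toward q
                h = rng.choice(K, p=pi[q])                   # z_{p<-q}: role of q toward p
                Y[t, p, q] = rng.random() < (1 - rho) * B[g, h]
    return mu, softmax(mu), Y
```

Because $\mu_{S(p,t-1)}$ is a weighted sum over neighbors, the scale of `W` matters. Large weights combined with many links can make the means drift quickly. Small weights, or weights that sum to one over each node's neighbors, are a sensible choice for experimentation.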

Inference and Learning 

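Under the Co-evolving MMSB, the joint probability of the data $Y^{0:T}$ and the latent variables $\{\mu_{1:N}^{0:T}, Z_{\to}^{0:T}, Z_{\leftarrow}^{0:T}\}$, where $Z_{\to}$ and $Z_{\leftarrow}$ collect the indicators $z_{p\to q}$ and $z_{p\leftarrow q}$ for all pairs p, q, can be written in the following factored form. To simplify the notation, we define $z_{pq}^t$ as the pair $(z_{p\to q}^t, z_{p\leftarrow q}^t)$.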

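$$p\big(Y^{0:T}, \mu_{1:N}^{0:T}, Z_{\to}^{0:T}, Z_{\leftarrow}^{0:T} \mid \alpha, A, B, \beta_p, w_{p\leftarrow q}, \Sigma_\mu\big) = \prod_t \prod_{p,q} P\big(Y^t(p,q) \mid z_{pq}^t, B^t\big)\, P\big(z_{pq}^t \mid \mu_p^t, \mu_q^t\big) \times \prod_{t=0}^{T-1} \prod_p P\big(\mu_p^{t+1} \mid \mu_p^t, \mu_{S(p,t)}, Y^t, \beta_p\big) \prod_p P\big(\mu_p^0 \mid \alpha, A\big) \qquad (1)$$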

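In Equation (1), the term describing the dynamics of the membership vector is defined as follows²:

$$P\big(\mu_p^t \mid \mu_p^{t-1}, \mu_{S(p,t-1)}, Y^{t-1}, \beta_p\big) = f_G\big(\mu_p^t - \bar{\mu}_p^t, \Sigma_\mu\big) \qquad (2)$$

$$f_G(x, \Sigma_\mu) = \frac{1}{(2\pi)^{K/2}\,|\Sigma_\mu|^{1/2}} \exp\Big(-\tfrac{1}{2}\, x^\top \Sigma_\mu^{-1} x\Big) \qquad (3)$$

$$\bar{\mu}_p^t = (1-\beta_p)\,\mu_p^{t-1} + \beta_p\,\mu_{S(p,t-1)} \qquad (4)$$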
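Eqs. (2)-(4) translate into a Gaussian log-density for one transition step. The sketch below assumes a diagonal noise covariance $\Sigma_\mu = \mathrm{diag}(\eta)$, following the diagonal assumption stated in the footnote. It also assumes a weight matrix `W[p, q]` $= w_{p\leftarrow q}$. The function names are hypothetical.

```python
import numpy as np

def influenced_mean(mu_prev, Y_prev, W, beta):
    """Eq. (4): bar_mu_p^t = (1 - beta_p) mu_p^{t-1} + beta_p mu_S(p,t-1)."""
    mu_S = (Y_prev * W) @ mu_prev      # row p: sum_q Y(p,q) w_{p<-q} mu_q^{t-1}
    return (1 - beta)[:, None] * mu_prev + beta[:, None] * mu_S

def log_transition(mu_t, mu_prev, Y_prev, W, beta, eta):
    """Sum over nodes p of log f_G(mu_p^t - bar_mu_p^t, Sigma_mu), Eqs. (2)-(3),
    assuming Sigma_mu = diag(eta)."""
    x = mu_t - influenced_mean(mu_prev, Y_prev, W, beta)
    K = mu_t.shape[1]
    # log of the normalizer 1 / ((2 pi)^{K/2} |Sigma_mu|^{1/2})
    log_norm = -0.5 * (K * np.log(2 * np.pi) + np.sum(np.log(eta)))
    # quadratic form x^T Sigma_mu^{-1} x reduces to a weighted sum of squares
    return np.sum(log_norm - 0.5 * np.sum(x**2 / eta, axis=1))
```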

Performing exact inference and learning under this model is not feasible, so one needs to resort to approximate techniques. Here we use a variational EM approach (Beal and Ghahramani 2003; Xing, Jordan, and Russell 2003). The main idea behind variational methods is to posit a simpler distribution q(X) over the latent variables with free (variational) parameters. Those parameters are then fit so that the distribution is close to the true posterior in KL divergence:

$$D_{KL}(q \,\|\, p) = \int_X q(X) \log \frac{q(X)}{p(X \mid Y)}\, dX \qquad (5)$$

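Here we introduce the following factorized variational distribution:

$$q(\mu, Z \mid \gamma, \Sigma, \phi) = \prod_{p,t} q_1\big(\mu_p^t \mid \gamma_p^t, \Sigma_p^t\big) \times \prod_{p,q,t} q_2\big(z_{p\to q}^t \mid \phi_{p\to q}^t\big)\, q_2\big(z_{p\leftarrow q}^t \mid \phi_{p\leftarrow q}^t\big) \qquad (6)$$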

where $q_1$ is the normal distribution, $q_2$ is the multinomial distribution, and $\gamma_p^t$, $\Sigma_p^t$, $\phi_{p\to q}^t$, $\phi_{p\leftarrow q}^t$ are the variational parameters. Intuitively, $\phi_{p\to q,g}^t$ is the probability of node p undertaking role g in an interaction with node q at time t, and $\phi_{p\leftarrow q,h}^t$ is defined similarly. Note that in the E-step we need to compute the expected value of $\log[\sum_k \exp(\mu_k)]$ under the variational distribution, which has no closed form. To this end, we introduce N additional variational parameters $\zeta$. We then replace the expectation of the log by its upper bound induced from the first-order Taylor expansion:

$$E_q\Big[\log \sum_k \exp(\mu_k)\Big] \le \log\zeta - 1 + \frac{1}{\zeta}\sum_k E_q\big[\exp(\mu_k)\big] \qquad (7)$$

² For simplicity, we will assume $\Sigma_\mu$ is a diagonal matrix.

Algorithm 1 Variational EM
    Input: data $Y^t(p,q)$, size $N$, $T$, $K$
    Initialize all $\{\gamma\}^t$, $\{\sigma\}^t$
    Start with an initial guess for the model parameters
    repeat
        repeat
            for t = 0 to T do
                repeat
                    Initialize $\phi_{p\to q,g}^t$, $\phi_{p\leftarrow q,h}^t$ for all g, h
                    repeat
                        Update all $\{\phi\}^t$
                    until convergence of $\{\phi\}^t$
                    Find $\{\sigma\}^t$, $\{\gamma\}^t$
                    Update all
                until convergence in time t
            end for
        until convergence across all time steps
        Update hyperparameters
    until convergence in hyperparameters



The variational EM algorithm alternates between two steps. The E-step computes expectations under the variational distribution. The M-step updates the model (hyper)parameters so that the data likelihood is locally maximized. The pseudo-code is shown in Algorithm 1, and the details of the calculations are discussed below.
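The bound in Eq. (7) can be checked numerically. Suppose a Gaussian factor has mean $\gamma$ and diagonal variances $\sigma$. Then $E_q[\exp(\mu_k)] = \exp(\gamma_k + \sigma_k/2)$. Setting the derivative of the bound with respect to $\zeta$ to zero gives $\zeta^\ast = \sum_k \exp(\gamma_k + \sigma_k/2)$, where the bound reduces to $\log\zeta^\ast$. The sketch below compares this against a Monte Carlo estimate of the left-hand side. The values of `gamma` and `sigma` are arbitrary, and `lse_bound` is a hypothetical name.

```python
import numpy as np

def lse_bound(gamma, sigma, zeta):
    """Right-hand side of Eq. (7) for q(mu) = N(gamma, diag(sigma)), sigma = variances.
    Uses E_q[exp(mu_k)] = exp(gamma_k + sigma_k / 2)."""
    return np.log(zeta) - 1.0 + np.sum(np.exp(gamma + sigma / 2)) / zeta

gamma = np.array([0.5, -1.0, 2.0])   # arbitrary variational means
sigma = np.array([0.3, 0.1, 0.5])    # arbitrary variational variances

# Monte Carlo estimate of the left-hand side, E_q[log sum_k exp(mu_k)]
rng = np.random.default_rng(0)
draws = gamma + rng.normal(size=(200_000, 3)) * np.sqrt(sigma)
lhs = np.mean(np.log(np.sum(np.exp(draws), axis=1)))

# d/dzeta of the bound vanishes at zeta* = sum_k E_q[exp(mu_k)] (the minimizer)
zeta_star = np.sum(np.exp(gamma + sigma / 2))
print(lhs, lse_bound(gamma, sigma, zeta_star), lse_bound(gamma, sigma, 2 * zeta_star))
```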

Variational E-step 

In the variational E-step, we minimize the KL divergence over the variational parameters. We take the derivative of the KL divergence with respect to each variational parameter and set it to zero. This yields a set of equations that can be solved via iterative or other numerical techniques. For instance, the variational parameters $(\phi_{p\to q}^t, \phi_{p\leftarrow q}^t)$ corresponding to a pair of nodes (p, q) at time t can be found via the following iterative scheme:

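$$\phi_{p\to q,g}^t \propto \exp\big(\gamma_{p,g}^t\big) \prod_h \Big(B(g,h)^{Y^t(p,q)}\,\big(1 - B(g,h)\big)^{1 - Y^t(p,q)}\Big)^{\phi_{p\leftarrow q,h}^t} \qquad (8)$$

$$\phi_{p\leftarrow q,h}^t \propto \exp\big(\gamma_{q,h}^t\big) \prod_g \Big(B(g,h)^{Y^t(p,q)}\,\big(1 - B(g,h)\big)^{1 - Y^t(p,q)}\Big)^{\phi_{p\to q,g}^t} \qquad (9)$$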



In the above equations, $\phi_{p\to q,g}^t$ and $\phi_{p\leftarrow q,h}^t$ are normalized after each update. Note also that Eqs. (8) and (9) are coupled with each other as well as with the parameters $\gamma_{p,g}^t$ and $\gamma_{q,h}^t$.
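For a single pair (p, q), Eqs. (8)-(9) can be iterated in log space until both vectors stop changing. This sketch holds $B$ and the $\gamma$'s fixed. Starting from uniform $1/K$ vectors is our own choice, and `update_phi_pair` is a hypothetical name.

```python
import numpy as np

def update_phi_pair(y, B, gamma_p, gamma_q, n_iter=100, tol=1e-10):
    """Coupled updates of Eqs. (8)-(9) for one pair (p, q) at one time step.

    y: observed Y^t(p, q) in {0, 1}; B: (K, K) with entries strictly in (0, 1);
    gamma_p, gamma_q: variational means gamma_p^t and gamma_q^t.
    Returns (phi_out, phi_in) = (phi^t_{p->q}, phi^t_{p<-q}).
    """
    K = B.shape[0]
    # log of B(g,h)^y (1 - B(g,h))^(1 - y)
    logL = np.log(B) if y == 1 else np.log1p(-B)
    phi_out = np.full(K, 1.0 / K)   # uniform start: our assumption
    phi_in = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # Eq. (8): log phi_out[g] = gamma_p[g] + sum_h phi_in[h] log L(g,h) + const
        a = gamma_p + logL @ phi_in
        new_out = np.exp(a - a.max())
        new_out /= new_out.sum()                  # normalize after the update
        # Eq. (9): log phi_in[h] = gamma_q[h] + sum_g phi_out[g] log L(g,h) + const
        b = gamma_q + logL.T @ new_out
        new_in = np.exp(b - b.max())
        new_in /= new_in.sum()
        change = max(np.abs(new_out - phi_out).max(), np.abs(new_in - phi_in).max())
        phi_out, phi_in = new_out, new_in
        if change < tol:
            break
    return phi_out, phi_in
```

With a diagonally dominant $B$, an observed link pulls both role vectors toward the same role. A missing link pushes them toward different roles.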

For the variational parameters $\Sigma_p^t$, we have for the diagonal components $(\sigma_{p1}^t, \sigma_{p2}^t, \dots, \sigma_{pK}^t)$:

$$\frac{1}{\sigma_{pk}^t} = \frac{1}{\eta_k}\Big[1 + (1-\beta_p)^2 + \sum_q Y^t(q,p)\,\beta_q^2\, w_{q\leftarrow p}^2\Big] + \frac{2(N-1)}{\zeta_p^t}\exp\Big(\gamma_{pk}^t + \frac{\sigma_{pk}^t}{2}\Big), \qquad (10)$$

where $\eta_k$ is the k-th diagonal component of the covariance matrix $\Sigma_\mu$. Similarly, we obtain equations for the variational parameters $\gamma$. In general, these equations differ for $\gamma_{p,g}^0$, $\gamma_{p,g}^T$, and $\gamma_{p,g}^t$ with $0 < t < T$. Since they are too cumbersome to reproduce here, we simply note their general form: the parameter $\gamma_p^t$ depends on its past and future values, $\gamma_p^{t-1}$ and $\gamma_p^{t+1}$, as well as on the parameters of its neighbors.
Finally, for the variational parameters $\zeta$ we have

$$\zeta_p^t = \sum_k \exp\Big(\gamma_{pk}^t + \frac{\sigma_{pk}^t}{2}\Big) \qquad (12)$$



Note that the above equations can be solved via simple iterative updates as before. To expedite convergence, however, we combine the iterations with the Newton-Raphson method. We solve for individual parameters while keeping the others fixed, and repeat this process until all the parameters have converged.
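As an illustration of combining Newton-Raphson with the iterative updates, consider the $\sigma$ equation for one node p, time t and component k. We assume Eq. (10) has the form $1/s = a_k + b\,\exp(\gamma_{pk}^t + s/2)$ with $\zeta_p^t$ held fixed. Here $a_k$ collects the bracketed influence term divided by $\eta_k$, and $b = 2(N-1)/\zeta_p^t$. The function $g(s) = 1/s - a_k - b\,\exp(\gamma_{pk}^t + s/2)$ is strictly decreasing on $s > 0$, so its root is unique. The sketch below finds that root by Newton steps safeguarded with bisection, and alternates with Eq. (12) for $\zeta_p^t$. The helper names are hypothetical.

```python
import numpy as np

def solve_sigma(a_k, b, gamma_k, tol=1e-12, max_iter=200):
    """Root of g(s) = 1/s - a_k - b*exp(gamma_k + s/2) over s > 0 (one component k,
    zeta fixed). g is strictly decreasing, so the root is unique when a_k > 0 or b > 0.
    Newton steps are used when they stay inside the bracket, bisection otherwise."""
    g = lambda s: 1.0 / s - a_k - b * np.exp(gamma_k + s / 2)
    lo, hi = 1e-12, 1.0
    while g(hi) > 0:                      # grow the bracket until g changes sign
        hi *= 2.0
    s = 0.5 * (lo + hi)
    for _ in range(max_iter):
        e = b * np.exp(gamma_k + s / 2)
        gs = 1.0 / s - a_k - e
        if gs == 0:
            return s
        if gs > 0:
            lo = s                        # root lies to the right of s
        else:
            hi = s                        # root lies to the left of s
        newton = s - gs / (-1.0 / s**2 - e / 2)
        s_new = newton if lo < newton < hi else 0.5 * (lo + hi)
        if abs(s_new - s) < tol * max(1.0, s):
            return s_new
        s = s_new
    return s

def update_sigma_zeta(gamma, a, N, n_outer=50):
    """Alternate the sigma equation for every k (zeta fixed) with Eq. (12)
    (zeta given sigma), for one node p at one time step t."""
    zeta = np.sum(np.exp(gamma))          # initial guess: value of Eq. (12) at sigma = 0
    for _ in range(n_outer):
        b = 2.0 * (N - 1) / zeta
        sigma = np.array([solve_sigma(a[k], b, gamma[k]) for k in range(len(gamma))])
        zeta = np.sum(np.exp(gamma + sigma / 2))
    return sigma, zeta
```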

Variational M-step

The M-step of the EM algorithm computes the parameters by maximizing the expected log-likelihood found in the E-step. The model parameters in our case are the role-compatibility matrix $B^t$, the covariance matrix $\Sigma_\mu$, $\beta_p$ for each node, $w_{p\leftarrow q}$ for each pair, and $\alpha$ and $A$ from the prior.

Suppose the time variation of the block-compatibility matrix is small compared to the evolution of the node attributes. Then we can neglect the time dependence in B and use its average across time, which yields:

$$B(g,h) = \frac{\sum_{p,q,t} Y^t(p,q)\,\phi_{p\to q,g}^t\,\phi_{p\leftarrow q,h}^t}{\sum_{p,q,t} \phi_{p\to q,g}^t\,\phi_{p\leftarrow q,h}^t} \qquad (13)$$
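Eq. (13) is a ratio of two weighted sums over all pairs and time steps, which `numpy.einsum` expresses directly. In this sketch (hypothetical name `update_B`), the array `phi_out[t, p, q, g]` holds $\phi_{p\to q,g}^t$ and `phi_in[t, p, q, h]` holds $\phi_{p\leftarrow q,h}^t$. Self-pairs $p = q$ are excluded by assumption.

```python
import numpy as np

def update_B(Y, phi_out, phi_in):
    """Eq. (13): time-averaged role-compatibility matrix.

    Y: (S, N, N) binary links over S snapshots; phi_out, phi_in: (S, N, N, K).
    """
    N = Y.shape[1]
    mask = 1.0 - np.eye(N)                                   # exclude p == q
    # numerator: sum_{p,q,t} Y^t(p,q) phi^t_{p->q,g} phi^t_{p<-q,h}
    num = np.einsum('tpq,tpqg,tpqh->gh', Y * mask, phi_out, phi_in)
    # denominator: sum_{p,q,t} phi^t_{p->q,g} phi^t_{p<-q,h}
    den = np.einsum('pq,tpqg,tpqh->gh', mask, phi_out, phi_in)
    return num / den
```

With uniform $\phi$'s, every entry of the estimate collapses to the overall link density, which is a quick sanity check.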



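Likewise, for the update of the diagonal components of the noise covariance matrix $\Sigma_\mu$:

$$\eta_k = \frac{1}{N(T-1)} \sum_{p,t} E_q\Big[\big(\mu_{pk}^t - \bar{\mu}_{pk}^t\big)^2\Big] \qquad (14)$$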
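Under the fully factorized Gaussian $q$ of Eq. (6), the expectation in Eq. (14) expands into a squared difference of means plus variance terms. The sketch assumes that expectation form of Eq. (14), binary links with no self-links, and the hypothetical name `update_eta`. It divides by the number of (node, transition) terms actually summed.

```python
import numpy as np

def update_eta(gamma, sigma, Y, W, beta):
    """Sketch of Eq. (14) under the factorized Gaussian q of Eq. (6).

    gamma, sigma: (S, N, K) variational means and variances over S snapshots;
    Y: (S, N, N) binary links with zero diagonal; W[p, q] = w_{p<-q}; beta: (N,).
    """
    S, N, K = gamma.shape
    total = np.zeros(K)
    for t in range(1, S):
        Aw = Y[t - 1] * W                                     # Y^{t-1}(p,q) w_{p<-q}
        # mean and variance of bar_mu_p^t under q (independent factors, no self-links)
        m = (1 - beta)[:, None] * gamma[t - 1] + beta[:, None] * (Aw @ gamma[t - 1])
        v = ((1 - beta) ** 2)[:, None] * sigma[t - 1] + (beta ** 2)[:, None] * ((Aw ** 2) @ sigma[t - 1])
        # E_q[(mu^t - bar_mu^t)^2] = (gamma^t - m)^2 + sigma^t + v
        total += np.sum((gamma[t] - m) ** 2 + sigma[t] + v, axis=0)
    return total / (N * (S - 1))       # number of (node, transition) terms summed
```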

Similar equations are obtained for $\beta_p$ and $w_{p\leftarrow q}$. The update equations for $\beta_p$ and $w_{p\leftarrow q}$ are functions of $\gamma$ and $\sigma$ related to the transitions of the specific node p. Since these equations are rather involved, they will be provided elsewhere.

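The prior parameters of the model can be updated in closed form as follows:

$$\alpha = \frac{1}{N}\sum_p \gamma_p^0 \qquad (15)$$

$$A_{kk} = \frac{1}{N}\sum_p \Big[\sigma_{pk}^0 + \big(\gamma_{pk}^0 - \alpha_k\big)^2\Big] \qquad (16)$$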
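Assuming the forms of Eqs. (15)-(16), with $A$ restricted to be diagonal, both prior updates are simple averages over the nodes at t = 0. The name `update_prior` is hypothetical.

```python
import numpy as np

def update_prior(gamma0, sigma0):
    """gamma0, sigma0: (N, K) variational means and variances at t = 0.
    Returns alpha (K,) and a diagonal A (K, K)."""
    alpha = gamma0.mean(axis=0)                                # Eq. (15)
    A_diag = np.mean(sigma0 + (gamma0 - alpha) ** 2, axis=0)   # Eq. (16), diagonal only
    return alpha, np.diag(A_diag)
```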



Results 



Experiments on Synthetic Data 

We tested our model by generating a sequence of networks according to the process described above, for 50 nodes and K = 3 latent roles across T = 8 time steps. We used a prior covariance matrix $A = 3I$ and a prior mean $\alpha$ with homogeneous components, so that initially the nodes have a well-defined role (i.e., each membership vector is peaked around a single role). More precisely, the majority of nodes had around 90% of their membership probability mass concentrated on a specific role. On average, a third of those nodes had 90% on any given role k. For the role-compatibility matrix, we gave high weight to the diagonal.

Starting from some initial parameter estimates, we performed variational EM and obtained re-estimated parameters that were very close to the original values (ground truth). With those learned parameters, we inferred the hidden trajectories of the agents, as given by their mixed membership vectors at each time step. The results are shown in Fig. 1(a), where, for three nodes, we plot the projection of the trajectories onto the simplex. For all three nodes, the inferred trajectories are very close to the actual ones.

Comparison with dMMSB 

As a further verification of our results, we compare the performance of our inference method to the dynamic mixed membership stochastic blockmodel (dMMSB) (Fu, Song, and Xing 2009). We use synthetic data generated in a manner similar to the previous section. This time, though, for simplicity we keep K = 2 and set $\beta$ to the same constant for all the nodes: $\beta = 0.1$ in one trial and $\beta = 0.2$ in the other. We compare performance by evaluating the L2 distance between the actual and inferred mixed membership vectors for each method. At each time step, we calculate the average over all nodes of the L2 distance from the actual membership vector.
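The error measure just described is a one-liner in NumPy: the average over nodes of the L2 distance between true and inferred membership vectors, computed per time step. This is a minimal sketch with a hypothetical function name, assuming both trajectories are stored as arrays of shape (time steps, N, K).

```python
import numpy as np

def mean_l2_error(pi_true, pi_hat):
    """Average over nodes of ||pi_p^t - hat_pi_p^t||_2, one value per time step."""
    return np.linalg.norm(pi_true - pi_hat, axis=-1).mean(axis=1)
```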



As shown in Fig. 1(b) and 1(c), CMMSB captures the dynamics better than dMMSB. This is because our model tracks all of the nodes individually (internal dynamics), while dMMSB regards the dynamics as an evolution of the environment (external dynamics). Here, we have only included results for relatively small and homogeneous dynamics. In fact, we noticed that our method tends to fare even better as we increase the degree of dynamics or the heterogeneity of dynamics across nodes (node-varying values of $\beta$). We believe heterogeneous dynamics are more prevalent in real systems, and so we expect our method to outperform dMMSB even more than is indicated by Fig. 1(c).



US Senate Co-Sponsorship Network 

We have also performed some preliminary experiments testing our model against real-world data. In particular, we used Senate co-sponsorship networks from the 97th to the 104th Senate, treating each Senate as a separate time point in the dynamics. There were 43 senators who remained part of the Senate during this period. For any pair of senators (p, q) in a given Senate, we generated a directed link p → q if p co-sponsored at least 3 bills that q originally sponsored. The threshold of 3 bills was chosen to avoid an overly dense network. With this data, we wanted to test (a) to what extent senators tend to follow others who share their political views (i.e., conservative vs. liberal), and (b) whether some senators change their political creed more easily than others.

Figure 1: (a) Actual and inferred mixed membership trajectories on a simplex; (b) inference error for dMMSB and CMMSB on synthetic data generated with K = 2 and $\beta = 0.1$ for all the nodes; (c) the same when $\beta = 0.2$ for all the nodes.
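The link-construction rule above can be sketched as follows. The input format (a list of `(sponsor, cosponsors)` records per Senate) and the name `cosponsor_network` are assumptions for illustration. Senators outside the 43-member panel are simply skipped.

```python
import numpy as np

def cosponsor_network(bills, senators, threshold=3):
    """Directed links p -> q when p co-sponsored >= threshold bills that q sponsored.

    bills: iterable of (sponsor, cosponsors) pairs for one Senate (hypothetical format).
    senators: the senators kept in the study; anyone else is ignored.
    """
    idx = {s: i for i, s in enumerate(senators)}
    counts = np.zeros((len(senators), len(senators)), dtype=int)
    for sponsor, cosponsors in bills:
        if sponsor not in idx:
            continue
        q = idx[sponsor]
        for c in set(cosponsors):                 # count each bill once per co-sponsor
            if c in idx and c != sponsor:
                counts[idx[c], q] += 1
    return (counts >= threshold).astype(int)      # Y[p, q] = 1 encodes the link p -> q
```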

The number of roles K = 2 was chosen to reflect the mostly bipolar nature of the US Senate. The susceptibility of senator p to influence is measured by the corresponding parameter $\beta_p$, which is learned using the EM algorithm. A high $\beta_p$ means that a senator tends to change his/her role more easily. Likewise, the power of influence of senator q on senator p is measured by the parameter $w_{p\leftarrow q}$. Here $w_{p\leftarrow q_1} > w_{p\leftarrow q_2}$ means senator $q_1$ is more influential on senator p than senator $q_2$. The direction of the arrow reflects the direction of influence, which is opposite to the direction of the link. To initialize the EM procedure, we assigned the same $\beta$ and $w$ to all the senators, and started with a diagonally weighted matrix for B.

Another means of validation is to compare the degree of influence. Our model handles, and learns, the degree of influence in the update equation. Identifying influential senators is an area of active research. KNOWLEGIS has been ranking US senators based on various criteria, including influence, since 2005. Since our data was extracted from the 97th to the 104th Senate, a direct comparison with those rankings was impossible. Another study (Maisel 2010) ranked the 10 most influential senators in both parties who have been elected since 1955. We compared our top 5 influential senators against this list and found 3 of them (Sen. Byrd, Sen. Thurmond, and Sen. Dole) in it.

Interpreting Results 

The role-compatibility matrix learned by variational EM has high values on the diagonal. This confirms our intuition that interaction is indeed more likely between senators who share the same role. Furthermore, the learned values of $\beta$ showed that senators varied in their "susceptibility". In particular, Sen. Arlen Specter was found to be the most susceptible to influence, while Sen. Dole was found to be one of the most inert. There is no direct way of estimating the "dynamism" of senators. Nevertheless, our results seem to agree with our intuition about both senators (e.g., Sen. Specter switched parties in 2009, while Dole became his party's candidate for President in 1996).

To get some independent verification, we compared our results to the yearly ratings that the ACU (American Conservative Union) and the ADA (Americans for Democratic Action) assign to senators. The ACU and ADA rate every senator based on selected votes that they believe have a clear ideological distinction. A high ACU score indicates a conservative senator and a low ACU score a liberal one, and vice versa for the ADA. To compare these ratings with our predictions (given by the membership vector), we scaled the former to lie in the range [0, 1].
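The sketch below shows the rescaling and the kind of correlation summarized in Fig. 2. We assume that both ratings are reported on a 0-100 scale. We also flip the ADA rating as $1 - x$, so that a higher value means more conservative for both. The helper names are hypothetical. Which of the two learned roles counts as "conservative" has to be decided after inspecting the fitted model.

```python
import numpy as np

def rescale_ratings(acu, ada):
    """Map ACU and ADA ratings (assumed 0-100) to [0, 1]; ADA is flipped so that
    higher values mean 'more conservative' for both."""
    acu01 = np.asarray(acu, dtype=float) / 100.0
    ada01_flipped = 1.0 - np.asarray(ada, dtype=float) / 100.0
    return acu01, ada01_flipped

def rating_correlation(ratings01, pi_conservative):
    """Pearson correlation between rescaled ratings and the inferred probability
    of the role interpreted as conservative."""
    return np.corrcoef(ratings01, pi_conservative)[0, 1]
```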

Fig. 2 shows the relationship between these scores and our mixed membership vector scores, confirming our interpretation of the two roles in our model as liberal and conservative. Although those values cannot be used for quantitative agreement, we found that, at least qualitatively, the inferred trajectories agree reasonably well with the ACU/ADA ratings. This agreement is rather remarkable, since the ACU/ADA scores are based on selected votes rather than on the co-sponsorship network used in our data.

Of course, we are most interested in correctly identifying the dynamics of each senator. We compare the inferred trajectories of the most dynamic senator (Specter) and the inert senator (Dole) with their ACU and ADA scores. In Fig. 3, the ADA scores have been flipped so that all of the scores can be compared on the same scale. However, the ACU/ADA scores are assigned to every senator separately each year. As a result, the dynamics of our inference and the dynamics of the ACU/ADA scores cannot be compared one to one. Not all senators showed as strong a correlation in trend as Senators Specter and Dole.

Figure 2: Correlation between ACU/ADA scores and inferred probabilities.

Figure 3: Comparison of inference results with ACU and ADA scores: Sen. Specter (top) and Sen. Dole (bottom).



Polarization Dynamics 




Figure 4: Polarization trends during the 97th-104th US Congresses.

The yearly ACU/ADA scores give a good comparison of the relative political positions of senators scored in the same year. However, they are not well suited for comparisons between years, since each year's score is based on voting records for a different set of bills. Therefore, for validation of the dynamics we turn to another scoring system, one that is highly regarded by political scientists and used to observe historical trends: the DW-NOMINATE score. For the time period of our study, (McCarty, Poole, and Rosenthal 2006) show that the political polarization of the Senate was increasing. In particular, they show that the gap between the average DW-NOMINATE scores of Republicans and Democrats increases monotonically, as we show in Fig. 4. In fact, the polarization of the entire Senate was stronger in every year. This is due to the unbalanced distribution of seats in the entire Senate: our data had 22 Republicans and 21 Democrats, whereas in the entire Senate the majority outnumbered the minority by around 10 seats. For comparison, for each time step we took the average of our inferred score for the 14 most conservative and for the 14 least conservative senators. As we show in Fig. 4, our inferred result agrees qualitatively with the results of (McCarty, Poole, and Rosenthal 2006), showing an increase in polarization for every Senate in the studied time window. The DW-NOMINATE score uses its own metric, while our polarization is measured by the difference between the upper and lower average probabilities. We should therefore not expect quantitative agreement. We would like to highlight, however, that the direction of the trend is correctly predicted for each of the eight terms.
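The polarization measure used for this comparison can be written as follows. For each time step, it is the mean inferred score of the 14 most conservative senators minus that of the 14 least conservative. The name `polarization` is hypothetical. The input is assumed to be the inferred probability of the conservative role for each senator in each Senate.

```python
import numpy as np

def polarization(pi_cons, m=14):
    """pi_cons: (T, N) inferred probability of the conservative role.
    Returns, per time step, mean of the m highest minus mean of the m lowest."""
    s = np.sort(pi_cons, axis=1)
    return s[:, -m:].mean(axis=1) - s[:, :m].mean(axis=1)
```

An increasing trend corresponds to `np.all(np.diff(polarization(pi_cons)) > 0)`.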

Conclusion 

We have presented the Co-evolving Mixed Membership Blockmodel for modeling inter-coupled node and link dynamics in networks. We used a variational EM approach for learning and inference with CMMSB, and were able to reproduce the hidden dynamics of synthetically generated data, both qualitatively and quantitatively. We also tested our model on US Senate bill co-sponsorship data and obtained reasonable results. In particular, CMMSB was able to detect the increasing polarization in the Senate reported by other sources that analyze the individual voting records of the senators. As future work, we intend to test our model on other real-world data, such as co-authorship networks of publications. We also plan to extend CMMSB in several ways. For instance, a bottleneck of the current model is that it explicitly considers links between all pairs of nodes, resulting in a complexity that is quadratic in the network size. Most real-world networks, however, are sparse, which is not accounted for in the current approach. Introducing sparsity into the model would greatly enhance its efficiency.